A lexicon for biology and bioinformatics: the BOOTStrep experience

Authors

  • Valeria Quochi
  • Monica Monachini
  • Riccardo Del Gratta
  • Nicoletta Calzolari

Abstract

This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of the project is text-based knowledge harvesting in support of information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource that encodes different types of information in a single integrated resource. In designing the resource we follow the ISO/DIS 24613 “Lexical Mark-up Framework” (LMF) standard, which ensures the reusability of the encoded information and easy exchange of both data and architecture. The design also takes into account the needs of our text mining partners, who automatically extract syntactic and semantic information from texts and feed it into the lexicon. This contribution first describes in detail the model of the BioLexicon along its three main layers: morphology, syntax and semantics. It then briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.
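
The abstract describes a three-layer model (morphology, syntax, semantics) encoded following LMF and populated from a common exchange XML format by automatic uploading procedures. The following is only a minimal sketch of that idea under stated assumptions: the XML element names, the LMF-style `feat`/`att`/`val` layout, the SQLite schema, and the functions `feats`, `create_schema` and `upload` are all hypothetical and are not the BioLexicon's actual exchange format, database, or loader.

```python
import sqlite3
import xml.etree.ElementTree as ET

# HYPOTHETICAL exchange record. Element names loosely follow the LMF
# convention of <feat att="..." val="..."/> pairs; this is NOT the actual
# BioLexicon XML interchange format.
SAMPLE_XML = """
<Lexicon>
  <LexicalEntry id="le_1">
    <feat att="partOfSpeech" val="verb"/>
    <Lemma><feat att="writtenForm" val="phosphorylate"/></Lemma>
    <WordForm><feat att="writtenForm" val="phosphorylates"/></WordForm>
    <WordForm><feat att="writtenForm" val="phosphorylated"/></WordForm>
    <SyntacticBehaviour>
      <feat att="subcategorizationFrame" val="NP_V_NP"/>
    </SyntacticBehaviour>
    <Sense id="s_1"><feat att="semanticType" val="biological_process"/></Sense>
  </LexicalEntry>
</Lexicon>
"""


def feats(elem):
    """Collect the direct <feat att=... val=...> children of an element into a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def create_schema(conn):
    """Hypothetical schema: one core table plus one table per layer."""
    conn.executescript("""
        CREATE TABLE entry (id TEXT PRIMARY KEY, lemma TEXT, pos TEXT);
        CREATE TABLE word_form (entry_id TEXT REFERENCES entry(id), form TEXT);
        CREATE TABLE subcat_frame (entry_id TEXT REFERENCES entry(id), frame TEXT);
        CREATE TABLE sense (id TEXT PRIMARY KEY, entry_id TEXT REFERENCES entry(id),
                            semantic_type TEXT);
    """)


def upload(conn, xml_text):
    """Parse an exchange document and insert every entry, layer by layer."""
    root = ET.fromstring(xml_text)
    for le in root.iter("LexicalEntry"):
        eid = le.get("id")
        lemma = feats(le.find("Lemma")).get("writtenForm")
        conn.execute("INSERT INTO entry VALUES (?, ?, ?)",
                     (eid, lemma, feats(le).get("partOfSpeech")))
        # Morphology layer: inflected word forms.
        for wf in le.findall("WordForm"):
            conn.execute("INSERT INTO word_form VALUES (?, ?)",
                         (eid, feats(wf).get("writtenForm")))
        # Syntax layer: subcategorization frames.
        for sb in le.findall("SyntacticBehaviour"):
            conn.execute("INSERT INTO subcat_frame VALUES (?, ?)",
                         (eid, feats(sb).get("subcategorizationFrame")))
        # Semantic layer: senses with a semantic type.
        for s in le.findall("Sense"):
            conn.execute("INSERT INTO sense VALUES (?, ?, ?)",
                         (s.get("id"), eid, feats(s).get("semanticType")))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    upload(conn, SAMPLE_XML)
    print(conn.execute("SELECT form FROM word_form ORDER BY form").fetchall())
    # prints [('phosphorylated',), ('phosphorylates',)]
```

Keeping one table per layer mirrors the morphology/syntax/semantics split, so in this sketch each layer could be filled independently from a different extraction source as long as all sources emit the same exchange format.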

Publication date: 2008